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(57) A computer system comprising a data file hav- 
ing entries each of which is designed to hold data, an 
advanced and a completed mapping file each having 
entries each of which is designed to hold a data-file-en- 
try address, an operation window that is a buffer to hold 
substances of operations waiting execution, and a state- 
modification queue that is designed to be able to hold a 
substance of a modification on the advanced mapping 
file for each clock cycle; wherein making a modification 



on the advanced mapping file, entering the substance 
of this modification into the state-modification queue, 
and entering substances of operations into the opera- 
tion window are each to be done in one clock cycle, and 
operations held in the operation window are to be exe- 
cuted out of order. The system can attain high perform- 
ance easily and utilize programs described in any ma- 
chine language for traditional register-based/stack- 
based processors. 
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Description 

FIELD OF THE INVENTION 

5 [0001] The present invention relates to a computer system of a novel construction that can attain high performance 
by providing a high degree of instruction-level parallelism. 

DESCRIPTION OF THE PRIOR ART 

10 [0002] So far, versatile computer systems that have ever been put to practical use can be roughly classified as 
register-based machines or stack machines. Stack machines are superior in smallness of program code size and in 
quickness of interruption handling and context switching, though they have difficulty in attaining high performance. On 
the other hand, register-based machines can relatively easily attain high performance, but, in many other aspects, they 
are inferior to stack machines. 

15 [0003] There is no program compatibility between register-based machines and stack machines, and as mentioned 
above, neither are absolutely superior. 

[0004] The object of the present invention is to materialize a computer system that can harmonize itself both with 
the instruction system of register-based machines and with that of stack machines, and that can attain high perform- 
ance. 

20 

SUMMARY OF THE INVENTION 

[0005] The computer system according to the present invention is basically similar to superscalar architectures ca- 
pable of out-of-order execution, though it adopts an instruction format of large bit length, in which substances of oper- 
as ations and a substance of a state modification to be made on the virtual logical registers / operand stack are to be 
specified. 

[0006] The computer system according to the present invention comprises a data file having entries each of which 
is designed to hold data, a free list that is designed to hold unallocated-data-file-entry addresses, an advanced and a 
completed mapping file each having entries each of which is designed to hold a data-file-entry address, an operation 
30 window that is a buffer to hold substances of operations waiting execution, a state-modification queue that is designed 
to hold substances of state modifications on mapping file indicated in instructions, and functional units for executing 
operations of respective kinds. 

[0007] The computer system of the present invention is so structured as to, in issuing one instruction per clock cycle, 
enter substances of the operations into the operation window, make a modification on the advanced mapping file, and 
35 enter the substance of this modification into the state-modification queue. In the case of an operation to generate result 
data, a data-file entry is newly allocated to hold the result data, and this makes the destination register. Besides, what 
makes each source register of operations is either a data-file entry whose address is obtained by accessing the ad- 
vanced mapping file, or a data-file entry that is being allocated as destination register of an operation contained in the 
same instruction. 

40 [0008] Each unexecuted operation held in the operation window is to be executed out of order by one of the functional 
units. 

[0009] After each operation derived from the same instruction that produced the substance of the head part of the 
state-modification queue is terminated normally, in accordance with the substance of the head part of the state-mod- 
ification queue, contents of the completed mapping file is to be modified so as to reproduce the modification that was 
45 made on the advanced mapping file in the course of issue of the instruction, and the head part is to be dequeued from 
the state-modification queue. 

[0010] The computer system of the present invention can operate in either of the two modes: the stack mode and 
the register mode. In the case of processing a program translated from machine language of register-based RISC/ 
CISC processors in conformity to the instruction format for the computer system of the present invention, the system 

so js to operate in the register mode. On the other hand, in the case of processing a program translated from machine 
language for stack-based processors in conformity to the instruction format for the computer system of the present 
invention, the system is to operate in the stack mode. It is easy to transform a plurality of instructions of a register 
machine/ stack machine into an instruction of the computer system of the present invention , so the system can efficiently 
process a wide range of programs. 

55 [001 1] In the register mode, each mapping file is to be used as an equivalent of register-mapping table of a register- 
based superscalar processor. 

[0012] In the stack mode, each mapping file is to be used to hold a kind of stack. Such a state of the operand stack 
as { , word4, word3, word2, wordl } (the right end is the top of the stack) in a traditional stack machine corresponds 
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to such a state of the stack held by mapping file as { , d, c, b, a } (the right end is the top of the stack), with wordl , 

word2, word3 and word4 respectively (to be) held in the data-file entries whose respective addresses are a, b, c and 
d, in the computer system of the present invention. 

5 BRIEF DESCRIPTION OF THE DRAWINGS 

[0013] 

Fig. 1 is a block diagram showing the basic structure of a preferred computer system according to the present 
'0 invention. 

Fig. 2 is a conceptual drawing of the instruction format for a preferred computer system according to the present 
invention. 

Fig. 3 illustrates the structure of an entry of the data file. 

Fig. 4 illustrates the structure of an entry of the later-described operation queue. 
15 Fig. 5 shows the relation among the advanced mapping file, the completed mapping file, two push pointers, the 

bottom pointer, and the fill-flag file, each described later. 

Fig. 6-12 each show contents of the advanced mapping file, the completed mapping file, the state-modification 
queue, the operation queue, the data file, and the free list, at one of seven points in the course of an example 
action in the register mode. 

20 Fig. 13 - 19 each show contents of the advanced mapping file, the completed mapping file, the state-modification 

queue, the operation queue, the data file, and the free list, at one of seven points in the course of an example 
action in the stack mode. 

PREFERRED EMBODIMENTS OF THE INVENTION 

25 

[0014] In the following, a preferred computer system according to the present invention is described. 
(1) Basic Structure 

30 [0015] To begin with, the basic structure of the computer system of this embodiment is described. 

[0016] Fig. 1 is a block diagram of the computer system. Shown in Fig. 1 are an instruction cache 10, a data cache 
1 1 , a data buffer 1 2, an instruction-fetch unit 20, an instruction-issue unit 21 , an advanced mapping file 3a, a completed 
mapping file 3c, a fill-flag file 3f , an advanced mapping history file 3h, a state-modification queue 4, an operation queue 
5, a data file 6, a free list 7, arithmetic logic units 80 and 81 , a branch unit 82, and a load/store unit 83. 

35 [0017] The computer system of the present invention needs to have an operation window, which is a buffer to hold 
substances of operations waiting execution. There are basically two ways to implement the operation window: central- 
ized type and distributed type. The computer system of this embodiment is supposed to have an operation queue of 
centralized type. 

[0018] The computer system of the present invention can operate in either of the two modes: the register mode and 
^0 the stack mode. In the case of processing a program translated from machine language of register-based RISC/CISC 
processors in conformity to the instruction format for the computer system of the present invention, the system is to 
operate in the register mode. On the other hand, in the case of processing a program translated from machine language 
for stack-based processors in conformity to the instruction format for the computer system of the present invention, 
the system is to operate in the stack mode. 
45 [0019] The computer system of the present invention has mapping files whose entries are each designed to hold a 
data-file-entry address, and that are to function in the same way as register-mapping tables of a register-based super- 
scalar processor capable of out-of-order execution in the register mode. In the present invention, the mapping files are 
to function equivalently in the stack mode as well. 

50 (2) Instruction Format 

[0020] Fig. 2 is a conceptual drawing of the instruction format for the computer system of this embodiment. Both for 
the register mode and for the stack mode, the format of each instruction basically consists of a format-information field 
101, an operations field 102, and state-modification field 103. 
55 [0021] Specified in the format-information field (Fl field) are the length of the instruction, the number of operations 
the instruction contains, and the like. 

[0022] Specified in the operations field (Op field) are substances of operations: arithmetic logic operations, branches, 
loads, stores, and the like. In this embodiment, each operation is expressed in the form similar to the instruction format 
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55 



for RISC processors. In this embodiment, each instruction is supposed to contain up to 4 operations (such instructions 
as to contain a branch operation do not necessarily contain 4 operations). 

[0023] Specified in the state-modification field (SM field) is substance of a modification on mapping file, i.e. state 
modification on the virtual registers (in the register mode) / the virtual operand stack (in the stack mode). As described 
later, the register mode differs from the stack mode in format inside the state-modification field. 
[0024] In the case of an operation to generate result data, a data-file entry is to be newly allocated to hold the result 
data. The logical position of this entry is to be indicated by the content of the SM field. 

[0025] The computer system of this embodiment is supposed to be able to issue one instruction in the above-men- 
tioned format per cycle. 

(2-1) Instruction Format for the Register Mode 

[0026] In the following, the instruction format for the register mode is explained. 

[0027] In the computer system of this embodiment, each data-file entry is supposed to be able to hold data that is 
to be held by a register of the virtual register machine. With rO, r1 , r2, ... representing contents of mapping-file entries 
of address 0, 1, 2, ... respectively, data-file entries whose respective addresses are rO, r1, r2, ... are supposed to 
respectively hold data of register R0, R1 , R2, ... of the virtual register machine. 

[0028] For example, a program for the register machine to load data of variables A, B and X from memory into 
registers, compute (A*X+B/X)*2, and store the result into memory as variable Y can be written as follows: 



LOAD R1, 100[R0]; 
LOAD R4, 400[R0]; 
25 MULR5, R1.R4; 

LOAD R2, 200[RO] ; 
DIV R4, R2, R4 ; 
ADD R5, R5, R4 ; 



load the value at address {the value in R0 +100} into register R1 
load the value at address {the value in R0 +400} into register R4 
R5=R1*R4 

load the value at address {the value in RO+200} into register R2 
R4 = R2/R4 
R5 = R5+R4 



MUL R5, R5, 2 ; R5 = R5*2 

35 STORE 500[R0], R5 ; store the value in register R5 at address {the value in R0 +500} 

where, the addresses of variables A, B, X and Y in memory are 100, 200, 400 and 500 added to the value in R0, 
respectively. 

[0029] Now, how to transform the above program into instructions for the register mode of the computer system of 
40 this embodiment is explained. 

[0030] Partitioned into groups each consisting of 4 instructions, with substance of operation and that of modification 
to be made on mapping file expressed separately for each instruction, the above program is transformed as follows: 





Instruction^ a 


loadfl, 100[r0] 


; 1:f1 


45 


Instruction^ b 


load f2, 400[r0] 


;4:f2 




Instruction^ c 


mulf3, f1,f2 


;5:f3 




Instruction^ d 


load f4, 200[r0] 


;2:f4 




lnstruction_2a 


divf1,r2,r4 


; 4:f1 


50 


lnstruction_2b 


add f2, r5, f1 


;5:f2 




lnstruction_2c 


mul f3, f2, 2 


;5:f3 




lnstruction_2d 


store 500[r0], f3 





where, addresses of the data-file entries to be newly allocated to hold result data are represented by f1 , f2, ... in order 
for each group of instructions. In this embodiment, f1 , f2, ... correspond to contents that are to be taken out of the free 
list in each cycle. 

[0031 ] Instruction,! a implies to load the value at address {1 00 added to the value in the data-file entry corresponding 
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to rO}, write it into the data-file entry corresponding to f 1 , and replace the content of the mapping-file entry of address 
1 with the address of the data-file entry corresponding to f1 . Instructional c implies to multiply the value in the data-file 
entry corresponding to f 1 by that corresponding to f2, write the result data into the data-file entry corresponding to f3, 
and replace the content of the mapping-file entry of address 5 with the address of the data-file entry corresponding to 
f3. The other instructions imply likewise. 

[0032] With four instructions put together, and with substances of operations and those of modifications to be made 
on mapping file expressed separately, for each of the two groups of instructions, the above program is transformed as 
follows: 



w 



Op{load f 1 , 1 00[r0]; load f2, 400[r0]; mul f3, f 1 , f2; load f4, 200[r0]} 
SM{1:f1,2:f4, 4:f2, 5:f3} 

Op{div f 1 , r2, r4; add f2, r5, f 1 ; mul f3, f2 t 2; store 500[r0], f3} 
SM{4:f1,5:f3). 



15 



20 



[0033] These are the contents of the Op and SM fields in the case where the above program is transformed to 
conform to the instruction format for the register mode of the computer system of this embodiment. In transforming 
into the second instruction, though there were two alterations on the mapping-file entry of address 5, only the last one 
is held in the SM field. As above, as for the instruction format for the register mode, combinations of signs, each standing 
for address of a mapping-file entry on which an alteration is to be made and stuff to write, are to be listed in the SM field. 
[0034] Here, f 1 , f2, f3, ... appearing in operations in the Op field as substitutes for destination registers are to be 
mechanically allotted in order, so they could be omitted. In this embodiment, however, they are supposed to be explicitly 
indicated. 



25 
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(2-2) Instruction Format for the Stack Mode 

[0035] Next, the instruction format for the stack mode is explained. 

[0036] In the computer system of this embodiment, each data-file entry is supposed to be able to hold data that is 
to be held by an entry of the operand stack of the virtual stack machine. Namely, with s3, s2, s1 representing contents 
of the stack held by mapping file (the right end is the top of the stack), the operand stack of the virtual stack machine 
is supposed to be constructed by lining up data that are to be held in data-file entries whose respective addresses 
are s3, s2, s1 in order. 

[0037] For example, a program for the stack machine to compute A*X+B/X, and store the result into memory as 
variable Y can be written as follows: 



35 



40 



45 



50 



55 



LOAD <4> ; push the value of local variable <4> onto the operand stack 

DUP ; duplicate the top word on the stack, and push the duplicate onto the stack 

LOAD <1 > ; push the value of local variable <1 > onto the operand stack 

MUL ; pop two words from the stack, multiply them, and push the result onto the stack 

SWAP ; swap top two words on the operand stack 

LOAD <2>; push the value of local variable <2> onto the operand stack 

SWAP ; swap top two words on the operand stack 

DIV ; pop two words from the stack, divide by the top word, and push the result onto the stack 

ADD ; pop two words from the stack, add them, and push the result onto the stack 

STORE <5> ; store the value on the top of the stack into local variable <5> 

where, the storage areas of variables A, B, X and Y are local variables <1>, <2>, <4> and <5>, respectively. 
[0038] Now, how to transform the above program into instructions for the stack mode of the computer system of this 
embodiment is explained. 

[0039] Partitioned into two groups of instructions, with substance of operation and that of modification to be made 
on mapping file expressed separately for each instruction, the above program is transformed as follows: 



Instruction, 


_1a 


load f 1 , <4> 


;+1: 


f1 


Instruction. 


_1b 


addf2,f1,0 


;+2: 


f1,f2 


Instruction, 


_1c 


load f3, <1> 


; +3: 


f 1 , f2, f3 


Instruction. 


_1d 


mul f4, f2, f3 


;+2: 


f1,f4 


Instruction. 


_1e 


(swap) 


;+2: 


f4, f1 
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(continued) 



lnstruction_2a 


loadf1,<2> 


;+1:f1 


lnstruction_2b 


(swap) 


;+1:f1,s1 


lnstruction_2c 


divf2, f1,s1 


;0: 12 


lnstruction_2d 


add f3, s2, f2 


;-1:f3 


lnstruction_2e 


store <5>, f3 


;-2: 



where, in the same way as for the register mode, addresses of the data-file entries to be newly allocated to hold result 
data are represented by f1 , f2 t ... in order for each group of instructions. Besides, as for each modification to be made 
on mapping file, the number of entries by which the stack held by mapping file is to grow and stuff to write in the vicinity 
of the top of the stack are indicated. 

[0040] Instructional a implies to load the value of local variable <4>, write it into the data-file entry corresponding to 
f 1 , grow the stack held by mapping file by one entry, and make content of the new top-of-the-stack entry to be the 
address of the data-file entry corresponding to f 1 . Instruction^ b implies likewise, however, the modification to be made 
on mapping file is expressed with that to be made by Instruction^ a involved. The other instructions imply likewise. 
[0041] With instructions put together, and with substances of operations and those of modifications to be made on 
mapping file expressed separately, for each of the two groups of instructions, the above program is transformed as 
follows: 



Op{ load f1 , <4>; add f2, f 1 , 0; load f3, <1 >; mul f4, f2, f3 } 
SM{ +2: f4, f1 } 

Op{ load f1 , <2>; div f2, f1 , s1 ; add f3, s2, f2; store <5>, f3 } 
SM{ -2: } 

[0042] These are the contents of the Op and SM fields in the case where the above program is transformed to 
conform to the instruction format for the stack mode of the computer system of this embodiment. As above, as for the 
instruction format for the stack mode, the amount of stack growth and a series of signs standing for post-modification 
stuff in the vicinity of the top of the stack are to be specified in the SM field. 

[0043] Here, as stated for the register mode, f 1 , f2, f3, ... appearing in operations in the Op field as substitutes for 
destination registers are to be mechanically allotted in order, so they could be omitted. In this embodiment, however, 
they are supposed to be explicitly indicated. 

[0044] So far, how programs for a register machine / stack machine are transformed to conform to the instruction 
format for the computer system of this embodiment has been explained. A program to be processed by the computer 
system of the present invention, however, need not necessarily be a one into which a program for a register machine 
/ stack machine is transformed. 

[0045] For example, a program for the stack mode to compute A+(B/2)*X+(C/3)*X*X, store the result into memory 
as variable Y, and pile the values of factors A, B/2 and C/3 in order on the operand stack may be such as follows: 



Op{load f1 , 8[s2]; load f2, 16[s2]; div f3, f2, 2; load f4, 24[s2]} 
SM{+3:f1,f3, f4} 

Op{div f 1 , s1 , 3; load f2, 200[s4]; mul f3, f2, f2; mul f4, s2, f2} 
SM{+2: f 1 , f4, f3} 

Op{mul f 1 , s3, s1 ; add f2, s5, s2; add f3, f2, f1 ; store 400[s6], f3} 
SM{-2:} 



where, the addresses of variables A, B and C in memory are respectively 8, 16 and 24 added to the value in the entry 
under the top of the initial virtual operand stack, and the addresses of variables X and Y are respectively 200 and 400 
added to the value in the entry at the top of the initial virtual operand stack. 



(3) Register Mode 

(3-1) Functions and Structures Required for the Action in the Register Mode 



[0046] Here, the function and structure of each component of the computer system of this embodiment used for the 
action in the register mode are described. They are the functions and structures needed in the stack mode as well. 
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(A) Instruction-Fetch unit 

[0047] The instruction-fetch unit is furnished with a program counter, which is not shown in the figures, and so struc- 
tured as to fetch instructions from the instruction cache and deliver them to the instruction-issue unit. It also carries 
5 out prediction and execution of branches. 

(B) Instruction-Issue Unit 

[0048] The instruction-issue unit is so structured as to, in issuing each instruction delivered from the instruction-fetch 
io unit, generate various signals to modify the advanced mapping file, the data file, etc., and enter substances into the 
operation queue and the state-modification queue. 

(C) Mapping files 

is [0049] A mapping file is so structured as to have entries, each of which can hold a data-file-entry address. In the 
register mode, each mapping file is to be used as an equivalent of register-mapping table of register-based superscalar 
processors. 

[0050] The computer system of the present invention is furnished with two mapping files: an advanced mapping file 
(AMF) and a completed mapping file (CMF). These two mapping files are of the same shape, and their respective 
20 entries correspond to each other. 

[0051] In the computer system of the present invention, each time an instruction is issued, a modification according 
with the instruction is made on the advanced mapping file (hereafter, it may be referred to as the AMF). Namely, the 
advanced mapping file is to reflect substances of all the instructions that have been issued. 

[0052] On the other hand, the completed mapping file (hereafter, it may be referred to as the CMF) is to reflect 
25 substances of all the instructions that have been completed in program-sequential order. The computer system of the 
present invention is capable of out-of-order execution of operations contained in the issued instructions. And, for se- 
curing precise handling of exceptions, the completed mapping file enables the system to construct the state grounded 
on all the instructions completed in order. 

[0053] The computer system of this embodiment is furnished with an advanced mapping history file (hereafter, it may 
30 be referred to as the AMF history file) for incarnation of speculative execution based on branch prediction. Each entry 
of the AMF history file is designed to be able to hold contents of all the AMF entries and later-described register PP-A. 

(D) Data file (DF) 

35 [0054] The data file (hereafter, it may be referred to as the DF) comprises physical registers: POO, P01 , P02, .... 
[0055] Fig. 3 illustrates the structure of entry 6(i) of the DF 6 in the computer system of this embodiment. Here, "i" 
stands for entry address. Each entry 6(i) of the DF 6 comprises data field 61 (i) and write-completion-flag (WCF) field 
62(i). 

[0056] As for the hardware implementation, the DF is practically made of register files that respectively correspond 
40 to the fields listed above. 

[0057] The data field of each DF entry is designed to hold a word of data. 

[0058] In each DF entry, the WCF field is supposed to hold "1 " in the case where data is already written in the data 
field, and hold "0" otherwise. 

45 (E) Free List (FL) 

[0059] The free list (hereafter, it may be referred to as the FL) is to hold addresses of free, namely, unallocated DF 
entries. In this embodiment, the FL is constructed as a circular FIFO queue. 

[0060] In the initialized state, the addresses of all DF entries are registered on the FL. In the case where a DF entry 
50 needs to be newly allocated, a free-DF-entry address is to be taken out of the FL. Conversely, if a DF entry is released 
from allocation, the address of this entry is to be registered on the FL. 

(F) Operation queue (OQ) 

55 [0061 ] The operation queue (hereafter, it may be referred to as the OQ) is a buffer to hold substances of operations 
contained in instructions that are already issued but not yet completed, and is constructed as a circular FIFO queue. 
[0062] Fig. 4 illustrates the structure of entry 5(i) of the OQ 5 in the computer system of this embodiment. Here, "i" 
stands for entry address. Each entry 5(i) of the OQ 5 comprises operation field 50(i), destination-register field 51 (i), 
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first-operand field 52(i), first-source-register field 53(i), second-operand field 54(i), second-source-register field 55(i), 
instruction-border (IB) field 56(i), branch-tag (BT) field 57(i) and state (S) field 58(i). 
[0063] The operation field of each OQ entry is designed to hold an operation code. 

[0064] The destination-register field of each OQ entry is designed to hold an address of DF entry allocated as des- 
s tination register of the operation. 

[0065] Each of the first- and second-operand fields of each OQ entry is designed to hold an operand value required 
for the operation. 

[0066] Each of the first- and second-source-register fields of each OQ entry is designed to hold an address of DF 

entry allocated as source register of the operation. 
10 [0067] In each OQ entry, the IB field is supposed to hold "1 " in the case where the entry was at the tail of the queue 

when the substance of operation, contained in an issued instruction, was entered into the OQ, and hold "0" otherwise. 

[0068] The BT field of each OQ entry is concerned with speculative execution based on branch prediction. In this 

embodiment, as described later, there is a certain relationship between branch tags to be entered in BT fields and 

entry addresses in the AMF history file. 
15 [0069] In each OQ entry, indicated in the S field is "unexecuted", "normal termination", "occurrence of exception" or 

the like according to the state of the operation held in the entry. 

[0070] The OQ is furnished with comparators for each source-register field of each entry, and has associative func- 
tions, by which the content of each source- register field is compared with DF-entry addresses and data is written into 
the operand field corresponding to each matched source-register field. 

20 

(G) State-Modification Queue (SMQ) 

[0071] The state-modification queue (hereafter, it may be referred to as the SMQ) is a buffer to hoid substances of 
modifications on mapping file indicated in instructions that are already issued but not yet completed, and is constructed 
25 as a circular FIFO queue. In this embodiment, each entry of the SMQ is to correspond to the SM field of an instruction. 

(H) Functional Units 

[0072] As shown in Fig. 1 , the computer system of this embodiment is furnished with four functional units: arithmetic 
30 logic unit 0 and 1 , a branch unit and a load/store unit (hereafter, it may be referred to as the LSU). These are to operate 
in parallel independently of each other. 

(3-2) Action in the Register Mode 

35 [0073] Next, the action in the register mode of the computer system of this embodiment is described. 

[0074] The computer system of this embodiment processes instructions through four stages: [1] instruction-fetch, 
[2] instruction-issue, [3] operations-execution, and [4] instruction-completion stage. In the following, the action in each 
stage is described. 

40 [1] Instruction-Fetch Stage 

[0075] In this stage, the instruction-fetch unit fetches an instruction from the instruction cache, and determines the 
address of the instruction to fetch next. Ordinarily, it is from the next address to fetch next. But, in the case where the 
fetched instruction includes either an unconditional branch operation or a conditional branch operation with a prediction 
45 of taking a branch, the case where a branch prediction turns out to have missed, or the case where an exception 
occurs, the address of the instruction to fetch next is changed. 

[2] Instruction-Issue Stage 

so [0076] In this stage, substances of operations according with the contents of the Op field of the instruction to be 
issued are entered into the operation queue (OQ), a modification according with the contents of the SM field is made 
on the advanced mapping file (AMF), and the substance of this modification is entered into the state-modification queue 
(SMQ). At this juncture, rO, M, r2, ... in the instruction are respectively replaced by p re-alteration contents of AMF 
entries of address 0, 1 , 2, and f1 , f2, ... are respectively replaced by DF-entry addresses that are taken out of the 

55 free list (FL) in order. 

[0077] In the IB field of each OQ entry into which substance of an operation contained in the instruction is being 
entered, "1" is entered for the OQ entry getting to be at the tail of the queue, and "0" is entered for those otherwise. In 
the BT field of each of the OQ entries, the branch tag forwarded from the instruction-issue unit is entered. 
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[0078] In the WCF field of each of the DF entries newly allocated In the course of Issue of the instruction (in respective 
correspondence with f1.f2.f3,...), "0" is entered. 

[0079] In the case of a load/store operation, its substance is not only entered into the OQ, but also, together with 
the address of the OQ entry into which the substance is being entered, delivered to the LSU. 
5 [0080] As for each writing of a DF-entry address replacing any of rO, r1 , r2, ... into a source-register field on the OQ, 
in the next cycle, contents of the DF entry of that address will be read out. In the case of the WCF-field content being 
"1", the data, together with the DF-entry address, will be conveyed to the OQ. Then, on the OQ, each source-register- 
field content will be compared with the conveyed DF-entry address, and the data will be written into the operand field 
(s) corresponding to the matched source-register field(s). 

w 

[3] Operations-Execution Stage 

[0081] Each of unexecuted operations held in the OQ is to be executed after becoming executable. Therefore, op- 
erations are executed out of order. 

'5 [0082] If there is an OQ entry where each required source data is written in the appropriate operand field, and if a 
functional unit that has the function of executing the operation held in that OQ entry is available, the substance of the 
entry, together with the entry address, is to be transmitted to the functional unit, and to be processed. 
[0083] As for each operation to generate result data, such as load and arithmetic/logic operations, if result data is 
obtained normally, it is to be written into the data field of the destination DF entry, and the WCF field is to be altered 

20 to "1". Besides, on the OQ, each source-register-field content is to be compared with the address of the above desti- 
nation DF entry, and the result data is to be written into the operand field(s) corresponding to the matched source- 
register field(s), if any. 

[0084] As for each OQ entry holding a substance of a load/store operation, the same substance was delivered to 
the LSU in the instruction-issue stage. In this embodiment, each operand data required for address calculation that 
25 was not arranged in the instruction-issue stage is supposed to be transmitted to the LSU right after entered into the 
OQ entry. 

[0085] The LSU executes load operations out of order. As for store operations, the LSU will perform stores in the 
instruction-completion stage so that precise handling of exceptions may be secured, though it performs address cal- 
culations out of order in the operations-execution stage. Therefore, in the case of a store operation, when the store 
30 data and the store address are both arranged, it is taken for terminated normally as action in the operations-execution 
stage. 

[0086] If an operation is terminated normally, the S field of the OQ entry holding the substance of this operation is 
to be altered to "normal termination". 

[0087] In the case where an exception occurs in executing an operation, this piece of information is to be written into 
35 the S field of the OQ entry holding the substance of that operation, and to be communicated to the instruction-fetch unit. 

[4] Instruction-Completion Stage 

[0088] Instructions are completed in program-sequential order. 

40 [0089] On the OQ, the entries ranging from the head one of the queue to the first one whose IB-field content is "1" 
hold the whole substances of the operations contained in the instruction to be completed next. If the S-f ield content is/ 
becomes "normal termination" in each of theOQ entries in the above range, the instruction gets ready to be completed. 
To complete the instruction, the system dequeues the OQ entries in the above range, makes a modification according 
with the substance of the head entry of the state-modification queue (SMQ) on the CMF, and dequeues this head entry. 

45 Here, the modification that was made on the AMF in the course of issue of the instruction is being reproduced on the 
CMF. 

[0090] Besides, in accordance with completion of the instruction, each address of DF entry to be released from 
allocation is to be registered on the FL. In the register mode, a set of addresses of DF entries to be released from 
allocation consists of the following two groups: the one(s) each being pre-alteration content of a CMF entry to be altered 

50 in the course of completion of the instruction, and the one(s) each being a destination-register-field content in the OQ 
entries to be dequeued that is not contained in the SMQ entry to be dequeued - not to be written in the CMF -. The 
computer system of the present invention could be so structured that registration of addresses of DF entries to be 
released from allocation onto the FL is to be carried out after the completion of the instruction. In this embodiment, 
however, it is supposed to be carried out at the same time as the completion of the instruction. 

55 [0091] In the case of completing an instruction that includes a store operation, the system requests the LSU to 
perform the store. Thus, data are to be stored in program-sequential order for sure. 

[0092] In the case where content of the S field is/becomes "occurrence of exception" in an OQ entry holding substance 
of an operation contained in the instruction to be completed next, as the virtual state at the point of the occurrence of 
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exception on the assumption that the program was being executed in order can be constructed by means of the CMF 
and the DF at this time, precise handling of exceptions can be materialized. To cancel all the instructions that have 
been issued after the instruction that caused an exception, the system puts each address of DF entry allocated as 
destination register in the operations to be canceled back on the FL, and dequeues entries in the range that corresponds 
5 to the instructions to be canceled both on the OQ and on the SMQ. 

[0093] The above is the overall action in the register mode of the computer system of this embodiment. 

(3-3) Example Action in the Register Mode 

10 [0094] Next, an example action is described. Now, let's consider processing, with the computer system of this em- 
bodiment, the above-mentioned program to compute Y=(A*X+B/X)*2 consisting of the following two instructions: 



Instruction..! 



[0095] 



Op{load f1 , 100[r0]; load f2, 400[r0]; mul f3, f1 , f2; load f4, 200[r0]} 
SM{1:f1,2:f4, 4:f2, 5:f3} 



20 Instruction^ 



[0096] 

Op{div f 1 , r2, r4; add f2, r5, f 1 ; mul f3, f2, 2; store 500[r0], f3} 
25 SM{4:f 1 , 5:f3}. 

[0097] Fig. 6 • 1 2 show changes in the state of the computer system of this embodiment in the course of processing 
of the above program in order of time. With reference to these figures, detailed action is described below. In Fig. 6 - 
12, the structure of each entry of the DF 6 / OQ 5 is the same as in Fig. 3 / Fig. 4. 

30 [0098] Let us suppose that contents of the FL / AMF are such as those shown in Fig. 6 at the outset. Here, as for 
the FL 7, which is constructed as a circular FIFO queue, it is shown that the four entries from the head one of the queue 
downward respectively correspond to f1 , f2, f3 and f4 for the instruction to be issued next. On the AMF 3a / CMF 3c, 
the entries are supposed to be tagged with address 0, 1, 2, ... from the top downward, and to correspond to rO, r1 , 
r2, respectively. As for places in which p## is entered, though a DF-entry address is written, you need not pay 

35 attention in the explanation of this example action. 

[0099] Let us suppose that Instruction^ is to be issued when the computer system is in such a state as shown in 
Fig. 6. 

[0100] Substances of operations according with the contents of the Op field of instruction.! are entered into the OQ, 
a modification according with the contents of the SM field is made on the AMF, and the substance of this modification 

40 is entered into the SMQ. At this juncture, each rO in the instruction is replaced by p12, which is the pre-alteration (Fig. 
6) content of the AMF entry of address 0, and f1 , f2, f3 and f4 are respectively replaced by p26, p34, p42 and p51 , 
which are taken out of the free list in order. In regard to the modification on the AMF, contents of entries corresponding 
to r1 , r2, r4 and r5 are altered to p26, p51 , p34 and p42, which replace f 1 , f4, f2 and f3, respectively. As for the IB field 
of each of the four OQ entries whose contents are shown in Fig. 7, "I" is entered for the bottom OQ entry as it is at 

45 the tail of the queue at this time, and "0" is entered for the other entries. Besides, in the S field of each of these four 
OQ entries, "0" is entered, which stands for "unexecuted". 

[0101] In the WCF field of each of the newly allocated DF entries whose respective addresses are p26, p34, p42 
and p51 - respectively corresponding to f1 , f2, f3'and f4 -, "0" is entered. (It is supposed that, in the DF entry of address 
p12 corresponding to rO, the WCF-field content is "1" and data "1000" is already written.) 
so [0102] Besides, in this example action, from beginning to end, as branch tag, "0" is supposed to be forwarded from 
the instruction-issue unit, and to be written in the BT field of each OQ entry into which substance of an operation is 
entered. 

[0103] And moreover, as Instruction^ includes 3 load operations, their substances are not only entered into the OQ, 
but also delivered to the LSU. 

55 [0104] Thus, the computer system in such a state as shown in Fig. 6 gets to be in such a state as shown in Fig. 7. 
[0105] In the next cycle, following Instruction^ , lnstruction_2 is to be issued. 

[0106] In the same way as in the case of Instructional , substances of operations according with the contents of the 
Op field of instruction_2 are entered into the OQ, a modification according with the contents of the SM field is made 
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on the AMF, and the substance of this modification is entered into the SMQ. 

[0107] In the WCF field of each of the newly allocated DF entries whose respective addresses are p1 6, p1 8 and p57 

- respectively corresponding to f 1 , f2 and f3 "0" is entered. 

[0108] And moreover, as lnstruction_2 includes a store operation, its substance is not only entered into the OQ, but 

5 also delivered to the LSU. 

[0109] Besides, since p1 2, which replaced rO, was written into source-register fields on the OQ in the course of issue 
of Instruction^ in the preceding cycle, contents of the corresponding DF entry are read out. As the WCF-field content 
turns out to be "I", read-out data "1000'\ together with DF-entry address "p12", is conveyed to the OQ. Then, on the 
OQ, by means of the associative function, data "lOOO" is written into the operand fields that respectively correspond 

10 to the source-register fields holding n p12". 

[0110] Thus, the computer system gets to be in such a state as shown in Fig. 8. 

[01 11] Unexecuted operations held in the OQ are to be executed out of order - each after becoming executable. 
[01 12] Fig. 9 shows the state that the computer system is in, at the point - after some cycles have gone by - when 
all the instructions issued before Instruction^ have been completed and all the four operations contained in 

'5 Instruction.! have been terminated normally. As for each of the four OQ entries holding the substances of the operations 
contained in Instruction^ , each required operand data is already written in the operand field, the S-field content is "1", 
which stands for normal termination, and in the DF entry indicated in the destination-register field, data is already 
written and WCF-field content is "1 ". And, contents of the CMF at this point are the same as contents of the AMF right 
before issuing of Instruction^ (Fig. 6). The system in this state is to complete Instruction^ in the following manner. 

20 [0113] As the top one of the eight OQ entries whose contents are shown in Fig. 9 is at the head of the queue at this 
point, four OQ entries ranging from that entry to the fourth entry, whose IB-field content is "1", are dequeued. As the 
top one of the two SMQ entries whose contents are shown in Fig. 9 is at the head of the queue at this point, a modification 
according with its substance is made on the CMF. Namely, contents of the CMF entries whose respective addresses 
are 1 , 2, 4 and 5 are altered to p26, p51 , p34 and p42, respectively. And moreover, the said head entry is dequeued 

25 on the SMQ. 

[0114] Besides, in accordance with completion of Instruction^ , each address of DF entry to be released from allo- 
cation is enqueued on the FL. In this case, pre-alteration contents of the CMF entries whose respective addresses are 
1, 2, 4 and 5 - p02, p10, p24 and p60, which are being altered - are read out and registered. (In this case, there is no 
destination-register-field content in the dequeued OQ entries that is not contained in the dequeued SMQ entry.) Thus, 
30 the computer system gets to be in such a state as shown in Fig. 1 0. 

[0115] Fig. 11 shows the state that the computer system is in, at the point - after some further cycles have gone by 

- when three operations contained in Instruction^ have been terminated normally and the store data and the store 
address have been both arranged for the last (store) operation. 

[0116] The system at the point of Fig. 11 is to complete lnstruction_2 in a cycle in the same manner as in the case 
35 of Instruction^ . In this case, lnstruction_2 includes a store operation, so the system requests the LSU to perform the 
store. Besides, in accordance with completion of the instruction, each address of DF entry to be released from allocation 
is enqueued on the FL. In this case, p34 and p42 - pre-alteration contents of CMF entries that are being altered -, and 
p18 - a destination-register-field content in the dequeued OQ entries that is not contained in the dequeued SMQ entry 

- are registered on the FL. Thus, the computer system gets to be in such a state as shown in Fig. 12. 
40 [01 17] Now, computation of Y=(A*X+B/X)*2 is concluded in the computer system of this embodiment. 

(4) Stack Mode 

(4-1) Functions and Structures Required for the Action in the Stack Mode 

45 

[01 18] The functions and structures described in section (3-1) are needed in the stack mode as well. Here, functions 
and structures required additionally for the action in the stack mode are to be described. These are mainly concerned 
with the mapping files and the load/store unit (LSU). (C+) Mapping files and Fill-Flag File (FFF) 
[0119] In the stack mode of the computer system of this embodiment, a mapping file and the data file (DF) are 

50 supposed to virtually construct an uppermost part of the operand stack. Namely, in the stack mode, each mapping file 

is to be used to hold a kind of stack. Such a state of the operand stack as { , word4, word3, word2, wordl } (the 

right end is the top of the stack) in a traditional stack machine corresponds to such a state of the stack held by mapping 

file as { , d, c, b, a } (the right end is the top of the stack), with wordl , word2, word3 and word4 respectively (to be) 

held in the data-file entries whose respective addresses are a, b, c and d, in the computer system of the present 

55 invention. When the operand stack grows so long, the remaining part is to be stored in the data buffer and the data cache. 
[0120] Each mapping file can function as a circular buffer. For this, they have two kinds of registers: push pointer 
and bottom pointer. The push pointer indicates the entry over the top one of the stack held by the mapping file. The 
bottom pointer indicates the bottom entry of the stack held by the mapping file. You can know how many entries are 



11 



EP1 324 192 A1 



vacant in the mapping file by subtracting the value of the push pointer from that of the bottom pointer In the initialized 
state, the value of the push pointer and that of the bottom pointer are both set to be zero. 

[0121] Besides, the computer system of this embodiment is furnished with a fill-flag file (FFF) having entries - provided 
in respective correspondence with the AMF/CMF entries - each of which is designed to hold one bit of information. 
5 Each FFF entry is supposed to hold "1 " in the case where the DF entry indicated in the corresponding AMF/CMF entry 
holds a value filled from the data buffer, and hold "0" otherwise. 

[0122] Fig. 5 shows the relation among two sets of mapping files and push pointers, the bottom pointer and the fill- 
flag file in the computer system of this embodiment. Two mapping files, the AMF 3a and the CMF 3c, and the fill-flag 
file (FFF) 3f have the same number of entries. And, on each mapping file, the entries are supposed to be tagged with 
10 address 0, 1,2, ... from the top downward. Each of the shaded entries of the AMF/CMF is supposed to hold a DF-entry 
address. As shown in Fig. 5, the AMF and the CMF are each furnished with a push pointer - respectively named PP-A 
and PP-C. On the other hand, only one bottom pointer is provided, which is shared between the AMF and the CMF. 
This is named BP. In the figures, the stack held by the AMF/CMF is supposed to grow downward. (H-3+) Load/Store 
Unit (LSU) and Data Buffer 

15 [0123] The data buffer is a circular buffer having entries each of which is designed to hold a word of data. 

[0124] In the stack mode, the LSU is so structured as to be able to access the data buffer and the data cache. 
[0125] In the stack mode of the computer system of this embodiment, while uppermost part of the operand stack is 
to be virtually constructed by means of a mapping file and the data file (DF), lower and even lower part are to be stored 
in the data buffer and in the data cache, respectively. As the LSU is supposed to be able to quickly access the data 

20 buffer, the larger proportion of variables to access the data buffer retains, the more efficiently the computation can 
proceed. And, by letting the data buffer retain an appropriate number of words of data, later-described spill/fill operations 
between the DF, the data buffer and the data cache can be streamlined. 

[0126] The LSU is furnished with a register to hold a pointer to first local variable: vars register, which is not shown 
in the figures. In the computer system of this embodiment, though the storage area of the first local variable is either 

25 jn the data buffer or in the data cache, the vars register is to hold the corresponding address in the data cache. Namely, 
even if all or part of the local variables are practically held in the data buffer, assuming that all the local variables were 
spilt into the data cache, you can associate each local variable with an address in the data cache. In processing a load/ 
store operation, the LSU may perform an address calculation referring to the value of the vars register, judge whether 
the storage area of the object local variable is in the data buffer or in the data cache, and access the storage area. 

30 [0127] The LSU is so structured as not only to process load/store operations specified in the program, but also to 
automaticaily perform spills/ fills between the data buffer and the bottom of the stack constructed by means of the AM F/ 
CMF and the DF in order to evade overflows/underflows. 

[0128] A word of data can be spilt from the DF into the data buffer, when the following two conditions are satisfied 
(otherwise, the spill is to be deferred): (a) an identical content is held in the entry indicated by BP both in the AMF and 

35 in the CMF; and (b) there is no OQ entry in the queue whose source-register field holds the same DF-entry address 
as said identical content. The system could be so structured that condition (b) is always satisfied, by introducing certain 
restrictions on cases in which an address of DF entry used as source register of an operation included in an instruction 
is to remain in mapping file. If such restrictions are not introduced into the instruction system, the computer system 
needs to be so structured as to have the following function: comparing the content of each source-register field on the 

40 OQ with the same DF-entry address as said identical content, and checking in regard to condition (b). In the case 
where the above two conditions are satisfied, a word of data held in the DF entry indicated by the content of the AMF/ 
CMF entry indicated by BP can be spilt into the data buffer. At this juncture, the value of BP is to be increased by one, 
and the address of said DF entry is to be registered on the FL. 

[0129] Conversely, in filling the DF with a word of data from the data buffer, the system is to take a word of data to 
45 fj|| with out of the data buffer, assign to it a free DF entry, write the word into the data field of this entry, whose WCF- 
field content is set to be "1", decrease the value of BP by one, and write the address of the assigned DF entry into the 
entry indicated by decreased-by-one BP both on the AMF and on the CMF At this juncture, "1" is to be entered into 
the FFF entry corresponding to the AMF/CMF entry into which the address of the DF entry is being written. 
[01 30] Besides, between the data buffer and the data cache, spills and fills are to be performed properly in accordance 
50 with vacancies in the data buffer. 

[0131] The system can be so structured that, with two mapping files (the AMF and the CMF), the data buffer and the 
data cache each divided into interleaved banks, operations similar to the above are to be respectively performed be- 
tween corresponding banks, so that a spill/fill operation of a plurality of words of data can be performed at once between 
the DF, the data buffer and the data cache. In this case, for spill/fill operations, each of the AMF, the CMF, the data 
55 buffer and the data cache needs to have a read/write port for each bank, and the DF needs to have as many read/ 
write ports as the interleaved divisions. 

[0132] The above is the basic functions and structures required additionally for the action in the stack mode 
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(4-2) Action in the Stack Mode 

[0133] Next, the action in the stack mode of the computer system of this embodiment is described. 
[0134] In the stack mode, in the same manner as in the register mode, the computer system of this embodiment 
s processes instructions through four stages: [1 ] instruction -fetch, [2] instruction-issue, [3] operations-execution, and [4] 
instruction-completion stage. In the following, the action in each stage is described. 

[1] Instruction-Fetch Stage 

10 [0135] In this stage, the instruction-fetch unit fetches an instruction from the instruction cache, and determines the 
address of the instruction to fetch next. 

[2] Instruction-Issue Stage 

is [0136] In this stage, substances of operations according with the contents of the Op field of the instruction to be 
issued are entered into the operation queue (OQ), a modification according with the contents of the SM field is made 
on the advanced mapping file (AMF) and its push pointer, PP-A, and the substance of this modification is entered into 
the state-modification queue (SMQ). At this juncture, s1 , s2, s3, ... in the instruction - which respectively correspond 
to top, 2nd, 3rd, ... of the stack - are respectively replaced by pre-alteration contents of corresponding AMF entries, 

20 and f 1 , f2, ... are respectively replaced by DF-entry addresses that are taken out of the free list (FL) in order. 

[01 37] In each of the entries of the fill-flag file (FFF) in the range corresponding to the AMF entries that are undergoing 
the modification - in the vicinity of the new top of the stack M 0" is entered. 

[0138] In the IB field of each OQ entry into which substance of an operation contained in the instruction is being 
entered, "1" is entered for the OQ entry getting to be at the tail of the queue, and "0" is entered for those otherwise. In 
25 the BT field of each of the OQ entries, the branch tag forwarded from the instruction-issue unit is entered. 

[01 39] In the WCF field of each of the DF entries newly allocated in the cou rse of issue of the instruction (in respective 
correspondence with f1 , f2, f3, ...), "0" is entered. 

[0140] In the case of a load/store operation, its substance is not only entered into the OQ, but also, together with 
the address of the OQ entry into which the substance is being entered, delivered to the LSU. 
30 [0141] As for each writing of a DF-entry address replacing any of s1 , s2, s3, ... into a source-register field on the OQ, 
in the next cycle, contents of the DF entry of that address will be read out. In the case of the WCF-field content being 
"1", the data, together with the DF-entry address, will be conveyed to the OQ. Then, on the OQ, each source-register- 
field content will be compared with the conveyed DF-entry address, and the data will be written into the operand field 
(s) corresponding to the matched source-register field(s). 

35 

[3] Operations-Execution Stage 

[0142] Each of unexecuted operations held in the OQ is to be executed after becoming executable in the same 
manner as in the register mode. 

40 

[4] Instruction-Completion Stage 

[0143] Instructions are completed in program-sequential order in the same manner as in the register mode except 
for the following. 

45 [0144] To begin with, it is in conformity with the format inside the SM field of instruction - namely, entry format for the 
state-modification queue (SMQ) - for the stack mode that the modification that was made on the AMF in the course of 
issue of the instruction is reproduced on the CMF. 

[0145] Besides, in accordance with completion of the instruction, each address of DF entry to be released from 
allocation is to be registered on the FL. In the stack mode, a set of addresses of DF entries to be released from allocation 
so consists of the following two groups: the one(s) each being a DF-entry address to be eliminated from the stack by the 
alterations on the CMF and PP-C in the course of completion of the instruction, and the one(s) each being a destination- 
register-field content in the OQ entries to be dequeued that is not contained in the SMQ entry to be dequeued - not to 
be written in the CMF 

[0146] The above is the overall action in the stack mode of the computer system of this embodiment. 

55 

(4-3) Example Action in the Stack Mode 

[0147] Next, an example action is described. Now, let's consider processing, with the computer system of this em- 
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bodiment, the above-mentioned program to compute Y=A*X+B/X consisting of the following two instructions: 
Instruction^ 
5 [0148] 

Op{ load f1 , <4>; add f2, f 1 , 0; load f3, <1 >; mul f4, f2, f3 } 
SM{ +2: f4, f 1 } 

10 lnstruction_2 

[0149] 

Op{ load f 1 , <2>; div f2, f 1 , s1 ; add f3, s2, f2; store <5>, f3 } 
15 SM{ -2: }. 

[01 50] Fig. 1 3 - 1 9 show changes in the state of the computer system of this embodiment in the course of processing 
of the above program in order of time. With reference to these figures, detailed action is described below. In Fig. 13 - 
19, the structure of each entry of the DF 6 / OQ 5 is the same as in Fig. 3 / Fig. 4. 

20 [0151] Let us suppose that contents of the FL / AMF are such as those shown in Fig. 13 at the outset. Here, as for 
the FL 7, which is constructed as a circular FIFO queue, it is shown that the four entries from the head one of the queue 
downward respectively correspond to f 1 , f2, f3 and f4 for the instruction to be issued next. As for the AMF, it is shown 

that the entries constituting the stack from the top downward correspond to s1 , s2 respectively. As for places in 

which p## is entered, though a DF-entry address is written, you need not pay attention in the explanation of this example 

25 action. Besides, in this example action, it is supposed, for the sake of simplicity, that no spill/fill operation between the 
DF and the data buffer is to be performed. So, the value of BP does not fluctuate. 

[0152] Let us suppose that Instructional is to be issued when the computer system is in such a state as shown in 
Fig. 13. 

[0153] Substances of operations according with the contents of the Op field of instruction.! are entered into the OQ, 
30 a modification according with the contents of the SM field is made on the AMF and PP-A, and the substance of this 
modification is entered into the SMQ. At this juncture, f 1 , f2, f3 and f4 in the instruction are respectively replaced by 
p26, p34, p42 and p51 , which are taken out of the free list in order. In regard to the modification on the AMF, p26 and 
p51 , replacing f1 and f4, are entered in the entries at and under the top of the stack grown by two entries, respectively. 
As for the IB field of each of the four OQ entries whose contents are shown in Fig. 14, "1" is entered for the bottom 
35 OQ entry as it is at the tail of the queue at this time, and "0" is entered for the other entries. Besides, in the S field of 
each of these four OQ entries, "0" is entered, which stands for "unexecuted". 

[0154] In the WCF field of each of the newly allocated DF entries whose respective addresses are p26, p34, p42 
and p51 - respectively corresponding to f1 , f2, f3 and f4 -, "0" is entered. 

[0155] Besides, in this example action, from beginning to end, as branch tag, "0" is supposed to be forwarded from 
40 the instruction-issue unit, and to be written in the BT field of each OQ entry into which substance of an operation is 
entered. 

[0156] And moreover, as Instruction^ includes 2 load operations, their substances are not only entered into the OQ, 
but also delivered to the LSU. 

[0157] Thus, the computer system in such a state as shown in Fig. 13 gets to be in such a state as shown in Fig. 14. 

45 [0158] In the next cycle, following Instruction.! , lnstruction_2 is to be issued. 

[0159] In the same way as in the case of Instruction.! , substances of operations according with the contents of the 
Op field of instruction_2 are entered into the OQ, a modification according with the contents of the SM field is made 
on the AMF and PP-A, and the substance of this modification is entered into the SMQ. At this juncture, s1 and s2 in 
the instruction are respectively replaced by p26 and p51 , which are the pre-alteration (Fig. 1 4) contents of the corre- 

50 sponding AMF entries, and f 1 , f2 and f3 are respectively replaced by p1 6, p1 8 and p57, which are taken out of the free 
list in order. As only a negative growth of the stack is indicated in the SM field, no alteration is required on the AMF. 
[0160] In the WCF field of each of the newly allocated DF entries whose respective addresses are p1 6, p1 8 and p57 
- respectively corresponding to f 1 , f2 and f3 -, "0" is entered. 

[0161] And moreover, as lnstruction_2 includes a load operation and a store operation, their substances are not only 
55 entered into the OQ, but also delivered to the LSU. 

[0162] Thus, the computer system gets to be in such a state as shown in Fig. 15. 

[0163] Unexecuted operations held in the OQ are to be executed out of order - each after becoming executable. 
[0164] Fig. 1 6 shows the state that the computer system is in, at the point - after some cycles have gone by - when 
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all the instructions Issued before Instruction^ have been completed and all the four operations contained in 
Instruction^ have been terminated normally. As for each of the four OQ entries holding the substances of the operations 
contained in Instruction^ , each required operand data is already written in the operand field, the S-field content is "1", 
which stands for normal termination, and in the DF entry indicated in the destination-register field, data is already 
s written and WCF-field content is "1". And, contents of the CMF and PP-C at this point are the same as contents of the 
AMF and PP-A right before issuing of Instructional (Fig. 13). The system in this state is to complete Instruction^ in 
the following manner. 

[0165] As the top one of the eight OQ entries whose contents are shown in Fig. 16 is at the head of the queue at 
this point, four OQ entries ranging from that entry to the fourth entry, whose IB-field content is "1", are dequeued. As 
10 the top one of the two SMQ entries whose contents are shown in Fig. 16 is at the head of the queue at this point, a 
modification according with its substance is made on the CMF and PP-C. Namely, 2 is added to the value of PP-C, 
and p26 and p51 are entered in the CMF entries at and under the top of the stack grown by two entries, respectively. 
And moreover, the said head entry is dequeued on the SMQ. 

[0166] Besides, in accordance with completion of lnstructton_1 , each address of DF entry to be released from allo- 
cs cation is enqueued on the FL. In this case, p34 and p42 - each being a destination-register-field content in the dequeued 
OQ entries that is not contained in the dequeued SMQ entry - are registered on the FL. (In this case, there is no DF- 
entry address that is eliminated from the stack by the alterations on the CMF and PP-C in the course of completion of 
the instruction.) Thus, the computer system gets to be in such a state as shown in Fig. 17. 

[0167] Fig. 18 shows the state that the computer system is in, at the point - after some further cycles have gone by 
20 - when three operations contained in lnstruction_2 have been terminated normally and the store data and the store 
address have been both arranged for the last (store) operation. 

[0168] The system at the point of Fig. 18 is to complete lnstruction_2 in a cycle in the same manner as in the case 
of Instruction^ . In this case, lnstruction_2 includes a store operation, so the system requests the LSU to perform the 
store. Besides, in accordance with completion of the instruction , each address of DF entry to be released from allocation 
25 is enqueued on the FL, In this case, p51 and p26 - each being a DF-entry address that is eliminated from the stack by 
the alterations on the CMF and PP-C in the course of completion of the instruction -, and p16, p18 and p57 - each 
being a destination-register-field content in the dequeued OQ entries that is not contained in the dequeued SMQ entry 
- are registered on the FL. Thus, the computer system gets to be in such a state as shown in Fig. 19. 
[0169] Now, computation of Y=A*X+B/X is concluded in the computer system of this embodiment. 

30 

(5) Speculative Execution 

[0170] Speculative execution based on branch prediction can be incarnated in the computer system of the present 
invention. Furnishing with an AMF history file is for incarnation of speculative execution. Each time the instruction to 
35 be issued includes a conditional branch operation, contents of all the AMF entries and PP-A are supposed to be written 
into an AMF-history-file entry. In the following, how speculative execution based on branch prediction goes on in the 
computer system of this embodiment is described. 

[0171] In the computer system of this embodiment, in the instruction -issue stage, from in the initialized state and till 
the first instruction including a conditional branch operation is issued, branch tag "0" is marked on each issued instruc- 
40 tion and entered in the BT field of each OQ entry into which substance of an operation included in the instructions is 
entered. 

[0172] When the first instruction including a conditional branch operation is issued, to preserve the state at the branch 
point, contents of all the AMF entries and PP-A are written into the AMF-history-file entry of address 0. After this, in 
the course of the instruction flow grounded on the branch prediction, branch tag "1 " is marked and entered in BT fields 
45 oftheOQ. 

[0173] When the second instruction including a conditional branch operation is issued, in either the case where the 
first conditional branch is unsettled or the case where it has been settled and the prediction proved right, contents of 
all the AMF entries and PP-A are written into the AMF-history-file entry of address 1 . In the course of the instruction 
flow grounded on the second branch prediction, branch tag "2" is marked and entered in BT fields of the OQ. 

50 [0174] If the branch prediction system continues to make a hit, the process is to go on in the same manner, and 
writing into the AMF history file is to be carried out in the order of address. Besides, after writing into the AMF-history- 
file entry of address n, and till the next writing, branch tag n+1 is supposed to be marked on each issued instruction. 
[0175] In the case where a branch prediction turns out to have missed, all the instructions issued after the one 
including the conditional branch operation are to be canceled, respectively corresponding contents of the AMF history 

55 file that were written when the instruction including the conditional branch operation was issued are to be copied into 
the AMF entries whose respectively corresponding contents of the FFF are "0" and into PP-A, and the issue process 
is to be resumed from the instruction at the right place. Here, in the register mode, the content of each FFF-entry is 
supposed to be "0". Besides, in canceling all the instructions issued after a certain one, each address of DF entry that 
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has been allocated as destination register in the operations to be canceled is to be put back in the FL, and all the 
entries in the range that corresponds to the instructions to be canceled are to be dequeued both on the OQ and on 

[01 76^ As above, in the computer system of the present invention, as the substance of the AMF at each point when 
an instruction including a conditional branch operation was issued can be reconstructed by using the AMF history file, 
speculative execution based on branch prediction is feasible. 



(6) Other Embodiments 

10 [01 77] Computer systems according to the present invention are not confined to the above-mentioned embodiment. 
There may be various embodiments with different detailed structures. For example, the following can be listed. 



(6-1) Embodiment A 

is [0178] A computer system according to the present invention can be so structured that, with a program that is con- 
formed to the instruction format for a traditional register-based/stack-based processor stored in the instruction cache, 
the program is to be transformed on the fly right before the instruction-issue stage to conform to the instruction format 
prescribed in section (2). 



20 (6-2) Embodiment B 

[0179] A computer system according to the present invention can be so structured that, in the course of issue of 
each instruction, addresses of DF entries to be released from allocation when the instruction is completed or pieces 
of information for identifying them are entered in the state-modification queue or a specialized queue in advance. 

25 

(6-3) Embodiment C 

[0180] Such a computer system can be incarnated, as to be furnished with an advanced and a completed register- 
type mapping file, an advanced and a completed stack-type mapping file, two push pointers and a bottom pointer, 
30 wherein both substance of a modification on register-type mapping file and that on stack-type mapping file and push 
pointer are to be indicated in the state-modification field of the instruction. In this case, an advanced mapping history 
file having entries each of which is designed to hold contents of the register-type advanced mapping file, the stack- 
type advanced mapping file and its push pointer, needs to be provided. 



35 (6-4) Embodiment D 

[0181] Many of the variations such as those conceivable for the register-based superscalar architecture can be ap- 
plied to the computer system based on the present invention. For example, the following can be listed. 

40 1 . A system implementing the operation window by providing a reservation station at the input of each functional 

unit, and having a reorder buffer. 

2. A system having a data file, a free list, functional units, etc. for each data type: integer / floating point, 32bit/ 
64bit or the like. 

3. A system having a plurality of sets each comprising an advanced mapping file, a completed mapping file and a 
45 fill-flag file, and being so structured as to change sets in each context switching. 

4. A system having a plurality of sets each comprising an advanced mapping file, a completed mapping file, a fill- 
flag file, a state-modification queue, etc., and being so structured as to be able to process a plurality of threads in 
parallel. 



50 INDUSTRIAL UTILITY 



[0182] According to the present invention, high-performance computer systems can be materialized. Besides, pro- 
grams described in any machine language for traditional register-based/stack-based processors can be easily trans- 
formed to conform to the instruction format for the computer system of the present invention. 
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Claims 

1 . A computer system comprising: 

a data file having entries each of which is designed to hold data; 

an advanced mapping file and a completed mapping file each having entries each of which is designed to hold 
an entry address in said data file; 

an operation window that is a buffer to hold substances of operations waiting execution; 

a state-modification queue that is designed to be able to hold a substance of a modification on said advanced 

mapping file for each clock cycle; and 

functional units for executing operations; 

wherein making a modification on said advanced mapping file, entering the substance of said modification 
into said state-modification queue, and entering substances of operations combined with said modification on said 
advanced mapping file into said operation window are each to be done in one clock cycle; 

each u nexecuted operation held i n said operation window is to be executed out of order by one of said fu notional 
units; and 

after all of the operations combined with the modification on said advanced mapping file specified by the 
substance of the head part of said state-modification queue are terminated normally, a modification according 
with said substance of said head part of said state-modification queue is to be made on said completed mapping 
file, and said head part is to be dequeued from said state-modification queue. 

2. The computer system according to claim 1 ; 

wherein said advanced mapping file and said completed mapping file are each to be accessed by entry 
addresses. 

3. The computer system according to claim 1 ; 

wherein said advanced mapping file and said completed mapping file are each to hold a stack, and to be 
accessed by displacements from the top of the stack. 
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FIG.2 
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FIG. 5 
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